Data-Driven Joint Debugging of the DBpedia Mappings and Ontology - Towards Addressing the Causes Instead of the Symptoms of Data Quality in DBpedia
نویسنده
چکیده
DBpedia is a large-scale, cross-domain knowledge graph extracted from Wikipedia. For the extraction, crowd-sourced mappings from Wikipedia infoboxes to the DBpedia ontology are utilized. In this process, different problems may arise: users may create wrong and/or inconsistent mappings, use the ontology in an unforeseen way, or change the ontology without considering all possible consequences. In this paper, we present a data-driven approach to discover problems in mappings as well as in the ontology and its usage in a joint, data-driven process. We show both quantitative and qualitative results about the problems identified, and derive proposals for altering mappings and refactoring the DBpedia ontology.
منابع مشابه
DBpedia Mappings Quality Assessment
The root of schema violations for rdf data generated from (semi-)structured data, often derives from mappings, which are repeatedly applied and specify how an rdf dataset is generated. The dbpedia dataset, which derives from Wikipedia infoboxes, is no exception. To mitigate the violations, we proposed in previous work to validate the mappings which generate the data, instead of validating the g...
متن کاملUpdating Wikipedia via DBpedia Mappings and SPARQL
DBpedia crystallized most of the concepts of the Semantic Web using simple mappings to convert Wikipedia articles (i.e., infoboxes and tables) to RDF data. This “semantic view” of wiki content has rapidly become the focal point of the Linked Open Data cloud, but its impact on the original Wikipedia source is limited. In particular, little attention has been paid to the benefits that the semanti...
متن کاملTest-driven Assessment of [R2]RML Mappings to Improve Dataset Quality
rdf dataset quality assessment is currently performed primarily after data is published. Incorporating its results, by applying corresponding adjustments to the dataset, happens manually and occurs rarely. In the case of (semi-)structured data (e.g., csv, xml), the root of the violations often derives from the mappings that specify how the rdf dataset will be generated. Thus, we suggest shiftin...
متن کاملExtending DBpedia with Wikipedia List Pages
Thanks to its wide coverage and general-purpose ontology, DBpedia is a prominent dataset in the Linked Open Data cloud. DBpedia’s content is harvested from Wikipedia’s infoboxes, based on manually created mappings. In this paper, we explore the use of a promising source of knowledge for extending DBpedia, i.e., Wikipedia’s list pages. We discuss how a combination of frequent pattern mining and ...
متن کاملAssessing and Refining Mappings to RDF to Improve Dataset Quality
rdf dataset quality assessment is currently performed primarily after data is published. However, there is neither a systematic way to incorporate its results into the dataset nor the assessment into the publishing workflow. Adjustments are manually –but rarely– applied. Nevertheless, the root of the violations which often derive from the mappings that specify how the rdf dataset will be genera...
متن کامل